8 - Subclustering

Author

CDN team

Published

June 11, 2024

Introduction

In this vignette we will examine methods for increasing resolution on cell subtypes and cell states. We will compare two approaches: tuning resolution and other parameters to find more clusters in the full dataset, and subclustering. Subclustering is the process of clustering, subsetting to one cluster, and then running the clustering pipeline again on that subset. In high-dimensional datasets, especially ones with substantial technical or biological noise, focusing on specific cell types individually improves detection of subtype and state patterns, because highly variable gene selection and latent-space calculation are both affected by noise and outliers. Subclustering can also improve computational efficiency: finding small clusters can be expensive when working with the full dataset.
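The mechanics can be illustrated outside scRNA-seq entirely. Below is a minimal sketch of the idea using base-R kmeans on simulated 2-D data (toy data, not our dataset): a coarse clustering merges two nearby subgroups into a single cluster, and re-clustering only that subset separates them.

```r
set.seed(1)

# Toy data: three coarse groups; the third hides two nearby subgroups
X <- rbind(
    matrix(rnorm(100, mean = 0),  ncol = 2),  # rows 1-50
    matrix(rnorm(100, mean = 8),  ncol = 2),  # rows 51-100
    matrix(rnorm(50,  mean = 16), ncol = 2),  # rows 101-125, subgroup A
    matrix(rnorm(50,  mean = 19), ncol = 2)   # rows 126-150, subgroup B
)

# Step 1: coarse clustering lumps subgroups A and B together
coarse <- kmeans(X, centers = 3, nstart = 25)$cluster

# Step 2: subset to the cluster containing the mixed group, cluster again
sub_idx <- which(coarse == coarse[101])
sub     <- kmeans(X[sub_idx, ], centers = 2, nstart = 25)$cluster
```

The second pass recovers structure the coarse pass could not, which is the same logic as subsetting a Seurat object to one annotated cell type before re-running the pipeline.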

However, it’s important to keep in mind that iterative subclustering can lead to “overfitting” the data. This means we might identify noise as clusters, and we will have to contend more with the “curse of dimensionality” in downstream analysis. We should always validate our clusters: check the expression of marker genes, use technical replicates or bootstrapping methods, or confirm that the clusters exist in external datasets.
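As a sketch of what the bootstrapping idea could look like, here is a minimal base-R example on the built-in iris data (a stand-in, since the idea needs no Seurat object): re-cluster bootstrap resamples and measure how often pairs of observations keep landing in the same cluster.

```r
set.seed(7)
X <- scale(iris[, 1:4])

# Reference clustering and its pairwise co-membership matrix
ref    <- kmeans(X, centers = 3, nstart = 10)$cluster
co_mat <- function(cl) outer(cl, cl, "==")
ref_co <- co_mat(ref)

# Re-cluster 20 bootstrap resamples; record pairwise co-clustering agreement
stability <- replicate(20, {
    idx  <- sample(nrow(X), replace = TRUE)
    boot <- kmeans(X[idx, ], centers = 3, nstart = 10)$cluster
    mean(co_mat(boot) == ref_co[idx, idx])
})

mean(stability)  # values near 1 suggest the clusters are stable
```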

Vocabulary

Subclustering

The process of dividing a previously identified cluster into smaller, more detailed clusters (subclusters). Subclustering is used to uncover finer, often subtle distinctions within a dataset that might not be visible in an initial analysis.

Overfitting

Overfitting is when an analyst models random noise in the data rather than the underlying general relationships. Overfit models perform well on their own dataset, but very poorly on other data.

Curse of Dimensionality

The term data analysts use to describe phenomena that appear when analyzing data in high-dimensional spaces. As dimensionality increases, the data become sparse, and sparsity is problematic for statistical significance testing. Additionally, by increasing the number of dimensions we increase the number of false positives when using p-value thresholds.
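The “near vs. far” contrast that clustering relies on can be watched collapsing directly. A quick base-R simulation (toy Gaussian data) of how the relative spread of pairwise distances shrinks as dimensions are added:

```r
set.seed(42)

# Coefficient of variation of pairwise distances among n random points in d dims
dist_spread <- function(d, n = 200) {
    X     <- matrix(rnorm(n * d), nrow = n)
    dists <- as.vector(dist(X))
    sd(dists) / mean(dists)
}

low  <- dist_spread(2)     # low-dimensional: distances vary a lot
high <- dist_spread(1000)  # high-dimensional: distances concentrate
c(low = low, high = high)
```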

Parameter scan

AKA parameter sweep: the process of systematically varying the parameters of an algorithm to analyze the effect of those changes on the outcome. Parameter scans are widely used in computational biology to identify optimal parameters or to test the stability of our models.

Key Takeaways

  • Recomputing highly variable genes at each subclustering step resets the biological universe we are looking at, to capture the “new” sources of variability within the subset.

  • Iterative subclustering is essential to uncover fine-grained populations.

  • In addition to finding fine-grained populations, subclustering can help create better divisions between the different “species” of cell types and subtypes.

Libraries

### Make sure all the packages are installed
if (!requireNamespace("Seurat", quietly = TRUE))
    install.packages("Seurat")

if (!requireNamespace("tidyverse", quietly = TRUE))
    install.packages("tidyverse")

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

if (!requireNamespace("colorBlindness", quietly = TRUE))
    install.packages("colorBlindness")

if (!requireNamespace("DT", quietly = TRUE))
    install.packages("DT")    

if (!requireNamespace("scales", quietly = TRUE))
    install.packages("scales") 

if (!requireNamespace("tictoc", quietly = TRUE))
    install.packages("tictoc") 

if (!requireNamespace("ggalluvial", quietly = TRUE))
    install.packages("ggalluvial")

if (!requireNamespace("RColorBrewer", quietly = TRUE))
    install.packages("RColorBrewer")

### Load all the necessary libraries
library(Seurat)
library(tidyverse)
library(devtools)
library(colorBlindness)
library(DT)
library(scales)
library(RColorBrewer)
library(tictoc)
library(ggalluvial)

set.seed(687)

Load data

We’re going to be working with a dataset from the paper “Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19”. Download the data from the cellxgene portal.

se <- readRDS("../data/se_lvl1.rds")

Color palette

# Set color palette for cell types
pal <- paletteMartin
names(pal) <- sort(unique(se$Celltype))

donor_pal <- c(
    "#66C2A4", "#41AE76", "#238B45", "#006D2C",
    "#41B6C4", "#1D91C0", "#225EA8", "#253494", "#081D58",
    "#FFEDA0", "#FED976", "#FEB24C", "#f79736", "#FD8D3C",
    "#FC4E2A", "#E31A1C", "#BD0026", "#ad393b", "#800000", "#800050")

names(donor_pal) <- c(
    "Normal 1", "Normal 2", "Normal 3", "Normal 4",
    "Flu 1", "Flu 2", "Flu 3", "Flu 4", "Flu 5",
    "nCoV 1", "nCoV 2", "nCoV 3", "nCoV 4", "nCoV 5",
    "nCoV 6", "nCoV 7", "nCoV 8", "nCoV 9", "nCoV 10", "nCoV 11"  
)

To get up to speed with the previous worksheets, process the data in the same way.

se <- se %>%
    NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(
        selection.method = "vst",
        nfeatures = 3000,
        verbose = FALSE) %>%
    ScaleData(verbose = FALSE, features = VariableFeatures(.))

Let’s extract the top 3000 HVGs from the whole data across all cell types.

hvg_full <- VariableFeatures(se)

Analysis

Finding rare celltypes

For many of us, the first idea for finding rare cell types is to tune the parameters we already have until the right cell types fall out. As we saw in the previous clustering notebook, we can find NK and T cell clusters this way, but it still seems like there’s some heterogeneity left within the clusters we’ve found. For illustration, I’ve run a parameter scan on the following variables:

  • FindVariableFeatures nfeatures
  • RunPCA npcs
  • FindNeighbors k.param
  • FindClusters resolution

parameter_df <- expand.grid(
  nf = c(2000, 3000, 5000),
  pc = c(20, 30, 50),
  k = c(10, 30, 50),
  res = c(0.5, 0.8, 1.2)
)

paramscan <- readRDS('../data/covid_flu_srobj_clusters_paramscan.rds')
paramscan <- bind_cols(paramscan)

row.names(paramscan) <- colnames(se)

paramscan_long <- paramscan %>%
  rownames_to_column(var = "cell_id") %>%
  pivot_longer(
    cols = -cell_id,
    names_to = "parameter",
    values_to = "cluster"
  ) %>%
  mutate(
    nf = as.numeric(gsub(".*nf(\\d+)_.*", "\\1", parameter)),
    pc = as.numeric(gsub(".*pc(\\d+)_.*", "\\1", parameter)),
    k = as.numeric(gsub(".*k(\\d+)_.*", "\\1", parameter)),
    res = as.numeric(gsub(".*res(\\d+\\.\\d+).*", "\\1", parameter))
  ) %>% 
  group_by(parameter) %>% 
  mutate(
    n_clusts = max(as.numeric(cluster))
  ) %>%
  select(-parameter)

ggplot(paramscan_long %>% 
        select(-cell_id, -cluster) %>% 
        distinct, aes(x = factor(pc), 
        y = n_clusts, 
        fill = factor(k))) +
  geom_bar(stat='identity') +
  facet_grid(paste('k=',k) + paste('nf =',nf) ~ paste('resolution =',res)) +
  scale_y_continuous(labels = scales::label_number()) +
  scale_fill_manual(values = unname(donor_pal[c(1,6,11)])) +
  labs(
      title = "Number of clusters Across Parameters",
      x = "Number of PCs",
      y = "Number of clusters",
      fill = "k param") +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

Take a close look at the results. When we modulate nfeatures or nPCs, we don’t see direct changes in the number of clusters found. But, as we saw in the previous sessions, modulating k.param and the clustering resolution has outsized effects on the number of clusters found.

But if we’re looking for a rare cell type, we must include more information, right? It makes the most sense to increase nfeatures, nPCs, AND the clustering resolution together, because we need to include the genes that define the cell type while making clusters small enough to isolate it.

So let’s take a closer look at what changes in the PCA latent spaces. Maybe the information we need is there.

Comparing high and low parameterized PCA spaces

Here we will compare the latent spaces of a low-parameter PCA and a high-parameter PCA: 2000 features and 20 PCs for the low setting, 5000 features and 50 PCs for the high setting. To keep a level head about resource usage, let’s time the processing runs to see how much more the high-parameter method costs.

tic()
se <- se %>%
    FindVariableFeatures(
            selection.method = "vst",
            nfeatures = 2000,
            verbose = FALSE) %>%
        ScaleData(verbose = FALSE, features = VariableFeatures(.)) %>% 
        RunPCA(
            npcs = 20,
            verbose = FALSE
        )
toc()
19.463 sec elapsed
latent_l <- se@reductions$pca@cell.embeddings
loadings_l <- se@reductions$pca@feature.loadings
var_l <- se@reductions$pca@stdev
t_l <- se@reductions$pca@misc$total.variance

tic()
se <- se %>%
    FindVariableFeatures(
            selection.method = "vst",
            nfeatures = 5000,
            verbose = FALSE) %>%
        ScaleData(verbose = FALSE, features = VariableFeatures(.)) %>% 
        RunPCA(
            npcs = 50,
            verbose = FALSE
        )
toc()
76.681 sec elapsed
latent_h <- se@reductions$pca@cell.embeddings
loadings_h <- se@reductions$pca@feature.loadings
var_h <- se@reductions$pca@stdev
t_h <- se@reductions$pca@misc$total.variance

var_l_df <- data.frame(PC = 1:length(var_l), Variance = var_l^2 / t_l, Set = "2000 Features")
var_h_df <- data.frame(PC = 1:length(var_h), Variance = var_h^2 / t_h, Set = "5000 Features")

# Calculate cumulative variance explained
var_l_df$CumulativeVariance <- cumsum(var_l_df$Variance)
var_h_df$CumulativeVariance <- cumsum(var_h_df$Variance)

# Combine the datasets
combined_variance <- bind_rows(var_l_df, var_h_df)

# Plotting the variance explained and cumulative variance
ggplot(combined_variance, aes(x = PC)) +
    geom_line(aes(y = CumulativeVariance, color = Set), linewidth = 1.2) +
    geom_point(aes(y = CumulativeVariance, color = Set), size = 2) +
    scale_color_manual(values = c("blue", "red")) +
    theme_minimal() +
    labs(title = "Cumulative Variance Explained",
         x = "Principal Component",
         y = "Proportion of Variance Explained") +
    guides(color = guide_legend(title = "Parameter Set"))

Superficially, it looks like we are capturing more of the variance with 2000 features + 20 PCs! But keep in mind that with only 2000 features the total variance is much lower, so each PC accounts for a larger share of it. What we can see here is that the elbow comes sooner with the higher-parameterized set.
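If we want a number instead of an eyeballed elbow, we can ask how many PCs are needed to reach a given share of the captured variance. A small helper, demonstrated on prcomp output for the built-in iris data rather than on our Seurat object (for the objects above, `se@reductions$pca@stdev` would play the role of `sdev`, keeping in mind that Seurat’s `total.variance` is computed over all scaled features, not just the retained PCs):

```r
# Smallest number of PCs whose cumulative share of captured variance
# reaches `threshold`
pcs_for_variance <- function(sdev, threshold = 0.9) {
    prop <- sdev^2 / sum(sdev^2)
    which(cumsum(prop) >= threshold)[1]
}

p <- prcomp(scale(iris[, 1:4]))
pcs_for_variance(p$sdev, threshold = 0.95)  # → 2 for the iris data
```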

# Ensure both matrices have the same genes in the same order
common_genes <- intersect(rownames(loadings_l), rownames(loadings_h))
loadings_l_aligned <- loadings_l[common_genes, , drop = FALSE]
loadings_h_aligned <- loadings_h[common_genes, , drop = FALSE]

# Determine high loadings
cutoff_l <- quantile(abs(as.matrix(loadings_l_aligned)), 0.95)
cutoff_h <- quantile(abs(as.matrix(loadings_h_aligned)), 0.95)

high_loadings_l <- apply(abs(loadings_l_aligned), 1, max) > cutoff_l
high_loadings_h <- apply(abs(loadings_h_aligned), 1, max) > cutoff_h

# Identify features that are high in _h but not high in _l
target_features <- names(which(high_loadings_h & !high_loadings_l))

# Prepare loadings of these target features for plotting
target_loadings <- loadings_h_aligned[target_features, , drop = FALSE]

# Convert to long format for ggplot using pivot_longer
loadings_long <- as.data.frame(target_loadings) %>%
  tibble::rownames_to_column("Feature") %>%
  pivot_longer(
    cols = -Feature,
    names_to = "PC",
    values_to = "Loading"
  )

compare_pca_loadings <- function(loadings1, loadings2, top_n = 20, cell_type_markers = NULL,
                                  loadings1_name = deparse(substitute(loadings1)), 
                                  loadings2_name = deparse(substitute(loadings2))) {
    all_genes <- union(rownames(loadings1), rownames(loadings2))

    # Initialize matrices to store aligned loadings with genes set to 0 by default
    loadings1_aligned <- matrix(0, nrow = length(all_genes), ncol = ncol(loadings1), 
                                dimnames = list(all_genes, colnames(loadings1)))
    loadings2_aligned <- matrix(0, nrow = length(all_genes), ncol = ncol(loadings2), 
                                dimnames = list(all_genes, colnames(loadings2)))

    # Fill in the existing values from each matrix
    loadings1_aligned[rownames(loadings1), ] <- loadings1
    loadings2_aligned[rownames(loadings2), ] <- loadings2

    # Check for zero values immediately after filling in values
    loadings1_zero <- apply(loadings1_aligned, 1, function(x) all(x == 0))
    loadings2_zero <- apply(loadings2_aligned, 1, function(x) all(x == 0))

    outlines <- character(length(all_genes))
    outlines[loadings1_zero] <- "orange"
    outlines[loadings2_zero] <- "blue"
    # outlines[loadings1_zero & loadings2_zero] <- "white" # Both are zero

    max_loadings1 <- apply(abs(loadings1_aligned), 1, max)
    max_loadings2 <- apply(abs(loadings2_aligned), 1, max)

    # Calculate the differences between the maximum absolute loadings across all PCs
    loadings_difference <- max_loadings2 - max_loadings1

    # Determine the direction of the difference
    directionality <- ifelse(loadings_difference > 0, paste("Higher in", loadings2_name), paste("Higher in", loadings1_name))

    fillpal <- c('steelblue','salmon')
    names(fillpal) = c(paste("Higher in", loadings2_name), paste("Higher in", loadings1_name))

    # Create a data frame for plotting
    genes_differences_df <- data.frame(
        Feature = rownames(loadings1_aligned),  # Assuming both matrices have the same rownames
        Difference = loadings_difference,
        Direction = directionality,
        Outline = outlines[match(rownames(loadings1_aligned), all_genes)]
    )

    genes_differences_df <- genes_differences_df %>% mutate(
                MaxAbsLoadings1 = ifelse(Direction == names(fillpal)[1], -max_loadings1, max_loadings1),
                MaxAbsLoadings2 = ifelse(Direction == names(fillpal)[2], -max_loadings2, max_loadings2))

    if (!is.null(cell_type_markers)) {
        marker_genes <- unlist(cell_type_markers, use.names = FALSE)
        genes_differences_df <- genes_differences_df %>%
            dplyr::filter(Feature %in% marker_genes) %>%
            dplyr::mutate(CellType = NA)  # Assign NA initially

        for (cell_type in names(cell_type_markers)) {
            genes_differences_df$CellType[genes_differences_df$Feature %in% cell_type_markers[[cell_type]]] <- cell_type
        }
        
        
        plot <- ggplot(genes_differences_df, aes(x = reorder(Feature, Difference), y = Difference, fill = Direction)) +
            geom_col() +
            coord_flip() +
            theme_minimal() +
            scale_fill_manual(values = fillpal) +
            labs(title = "Gene Loadings Differences by Cell Type",
                 subtitle = paste("Comparing", loadings1_name, "and", loadings2_name),
                 x = "Gene",
                 y = "Difference in Loadings") +
            guides(fill = guide_legend(title = "Where Loadings are Higher")) +
            facet_wrap(~CellType, scales = "free_y", ncol = 1, drop = TRUE)
    } else {
        genes_differences_df <- genes_differences_df %>%
            dplyr::arrange(desc(Difference)) %>%
            dplyr::slice(1:top_n)

        plot <- ggplot(genes_differences_df, aes(x = reorder(Feature, Difference), y = Difference, fill = Direction)) +
            geom_col(show.legend = FALSE) +
            coord_flip() +
            theme_minimal() +
            scale_fill_manual(values = fillpal) +
            labs(title = paste("Top", top_n, "Genes with Greatest Differences in Loadings"),
                subtitle = paste("Comparing", loadings1_name, "and", loadings2_name),
                x = "Gene",
                y = "Difference in Loadings") +
            guides(fill = guide_legend(title = "Where Loadings are Higher"))
    }

    return(plot)
}

# Example usage:
compare_pca_loadings(loadings_l, loadings_h, top_n = 40)

Many of these genes are trained-immunity genes. At this point we could keep some rare positive-control cell type in mind, like ILC3s, or something else we might expect to find in this dataset.

# Define a list of cell type markers
cell_type_markers <- list(
  ILC3s = c("IL7R", "PTGDR2", "NCR2", "CCR6", "RORC"),
  NKTfh = c("CD1D", "CXCR5", "PDCD1", "ICOS", "BCL6"),
  Plasmablasts = c("CD38" = "CD38", "CD27" = "CD27", "CD319" = "SLAMF7", "IRF4" = "IRF4", "XBP1" = "XBP1"),
  MDSCs = c("CD33" = "CD33", "CD11b" = "ITGAM", "S100A8" = "S100A8", "S100A9" = "S100A9", "ARG1" = "ARG1")
)

compare_pca_loadings(loadings_l, loadings_h, cell_type_markers = cell_type_markers)

Comparison to subclustering

To do this we will focus on the T cell compartment.

tse <- se[, se$annotation_lvl1 == "T cells"]

First let’s process our object with the HVGs obtained from the whole dataset.

tse_full <- tse %>%
    NormalizeData() %>%
    ScaleData(
        verbose = FALSE,
        features = hvg_full) %>% 
    RunPCA(
        features = hvg_full,
        npcs = 20,
        verbose = FALSE) %>%
    FindNeighbors() %>%
    FindClusters(resolution = c(0.05, 0.1, 0.15, 0.2))
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 811889

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9683
Number of communities: 4
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 811889

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9544
Number of communities: 5
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 811889

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9412
Number of communities: 7
Elapsed time: 2 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 811889

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9322
Number of communities: 10
Elapsed time: 3 seconds
# Visualize these clusters
DimPlot(
    tse_full,
    reduction = 'umap',
    group.by = c("RNA_snn_res.0.05", "RNA_snn_res.0.1",
                 "RNA_snn_res.0.15", "RNA_snn_res.0.2"))

dim_full <- DimPlot(
    tse_full,
    reduction = 'umap',
    group.by = "RNA_snn_res.0.15")

Now let’s recompute the HVG for the T cell subset.

tse_sub <- tse %>%
    FindVariableFeatures(
        selection.method = "vst",
        nfeatures = 3000,
        verbose = FALSE) %>%
    ScaleData(verbose = FALSE, features = VariableFeatures(.)) %>% 
    RunPCA(
        npcs = 20,
        verbose = FALSE
    ) %>%
    FindNeighbors() %>%
    FindClusters(resolution = c(0.05, 0.1, 0.15, 0.2)) %>%
    RunUMAP(dims = 1:20)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 814437

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9660
Number of communities: 4
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 814437

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9502
Number of communities: 5
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 814437

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9381
Number of communities: 8
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26261
Number of edges: 814437

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9287
Number of communities: 10
Elapsed time: 3 seconds
# Visualize these clusters
DimPlot(
    tse_sub,
    reduction = 'umap',
    group.by = c("RNA_snn_res.0.05", "RNA_snn_res.0.1",
                 "RNA_snn_res.0.15", "RNA_snn_res.0.2"))

dim_sub <- DimPlot(
    tse_sub,
    reduction = 'umap',
    group.by = "RNA_snn_res.0.15")

Right off the bat, how do these UMAPs compare to each other?

dim_full + dim_sub

What you’re probably wondering first is: would I have found these clusters anyway if I had just increased the clustering resolution?

data_alluvial <- data.frame(
    bc = colnames(tse_full),
    full_hvg = tse_full$RNA_snn_res.0.15,
    sub_hvg = tse_sub[, colnames(tse_full)]$RNA_snn_res.0.15) %>% 
    dplyr::count(full_hvg, sub_hvg)

ggplot(data = data_alluvial,
       aes(axis1 = full_hvg, axis2 = sub_hvg, y = n)) +
    geom_alluvium(aes(fill = full_hvg), width = 0.1) +
    geom_stratum(width = 0.1) +
    geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
    theme_minimal() +
    labs(title = "Alluvial plot of clustering using full-data HVG or subcluster HVG",
         x = "Cluster assignment",
         y = "N")

Both clusterings look very similar, but using the subset-specific HVGs we are able to detect one more cluster at the same resolution, and in theory the clusters should be more specific.
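The alluvial plot is qualitative; if we want one number summarizing how similar two clusterings are, the adjusted Rand index (ARI) is a standard choice. Packages such as mclust provide it, but it is small enough to write in base R (a sketch; for the comparison above one would pass the two `RNA_snn_res.0.15` vectors):

```r
# Adjusted Rand index between two labelings of the same observations:
# 1 = identical partitions, ~0 = no better than chance
adjusted_rand_index <- function(a, b) {
    tab <- table(a, b)
    n   <- sum(tab)
    sum_pairs <- function(x) sum(choose(x, 2))
    index     <- sum_pairs(tab)
    expected  <- sum_pairs(rowSums(tab)) * sum_pairs(colSums(tab)) / choose(n, 2)
    max_index <- (sum_pairs(rowSums(tab)) + sum_pairs(colSums(tab))) / 2
    (index - expected) / (max_index - expected)
}

adjusted_rand_index(c(1, 1, 2, 2), c("a", "a", "b", "b"))  # → 1
```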

If we check, for example, the overlap of highly variable features between the two methods, we’ll find that a substantial fraction of them do not overlap:

table(VariableFeatures(tse_sub) %in% hvg_full)

FALSE  TRUE 
 1064  1936 
head(setdiff(VariableFeatures(tse_sub),  hvg_full), n = 50)
 [1] "BTK"           "PTMA"          "UQCRH"         "CLIC4"         "MBOAT2"        "COL9A3"        "MSRB3"         "MAP7"          "JUP"           "TIMM13"        "NDUFS6"        "SLC25A3"       "SLC45A3"       "MTDH"          "MRPL52"        "FKBP1A"        "PSMB3"         "HNRNPD"        "ZNF706"        "ATP5MF"        "YBX3"          "XRCC5"         "HDGF"          "ACTR3"         "SAPCD2"        "ATP5PF"        "COX7B"         "ANP32A"        "ATP5F1C"       "CHCHD2"        "SRSF3"         "VCP"           "SSR1"          "LCA5"          "SMS"           "ERH"           "CFL1"          "RPLP0"         "UQCRC1"        "CALM3"         "TIMM8B"        "NDUFAB1"       "PSMA7"         "COX8A"         "NCOA3"         "RP11-281P23.1" "EIF5A"         "UQCRQ"         "GAS2L3"        "PSMB8"        

We can see that by recomputing the HVGs we replace roughly a third of them (1064 of 3000), so that the retained set captures the variability present within the T cell subset.
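This kind of gene-set turnover is easy to quantify with a small helper (shown here on toy gene sets, not the actual HVG vectors; applied to `VariableFeatures(tse_sub)` and `hvg_full` it reproduces the table above):

```r
# Fraction of set `a` found in `b`, plus the symmetric Jaccard index
overlap_stats <- function(a, b) {
    c(frac_a_in_b = mean(a %in% b),
      jaccard     = length(intersect(a, b)) / length(union(a, b)))
}

overlap_stats(c("CD3D", "CD3E", "TRAC", "FOXP3"),
              c("CD3D", "CD3E", "TRAC", "NKG7", "GNLY"))
# → frac_a_in_b = 0.75, jaccard = 0.5
```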

Annotation playground

Let’s find the marker genes of this new clustering

Idents(tse_sub) <- tse_sub$RNA_snn_res.0.15
mgs <- FindAllMarkers(
    tse_sub,
    logfc.threshold = 0.25,
    only.pos = TRUE,
    min.pct = 0.25)

DT::datatable(mgs, filter = "top")

Do you think you can annotate these new clusters?

Cluster 1

Let’s look at genes that are differentially expressed

FeaturePlot(
    tse_sub,
    features = c("CD3D", "CD3E", "TRAC",
                 "TRBC1", "TRBC2", "TRDC",
                 "TRGC1", "TRGC2"),
    ncol = 3) +
    dim_sub

VlnPlot(
    tse_sub,
    features = c("CD3D", "CD3E", "TRAC",
                 "TRBC1", "TRBC2", "TRDC",
                 "TRGC1", "TRGC2"),
    group.by = "RNA_snn_res.0.05") +
    dim_sub

….

Annotate

tse_sub@meta.data <- tse_sub@meta.data %>%
  dplyr::mutate(
    annotation_lvl2 = dplyr::case_when(
      RNA_snn_res.0.15 == 0 ~ "",
      RNA_snn_res.0.15 == 1 ~ "",
      RNA_snn_res.0.15 == 2 ~ "",
      RNA_snn_res.0.15 == 3 ~ "",
      RNA_snn_res.0.15 == 4 ~ "",
      RNA_snn_res.0.15 == 5 ~ "",
      RNA_snn_res.0.15 == 6 ~ "",
      RNA_snn_res.0.15 == 7 ~ ""
      )
  )

DimPlot(tse_sub, group.by = "annotation_lvl2")

Session Info

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggalluvial_0.12.5    tictoc_1.2.1         RColorBrewer_1.1-3   scales_1.3.0         DT_0.33              colorBlindness_0.1.9 devtools_2.4.5       usethis_2.2.3        lubridate_1.9.3      forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2          readr_2.1.5          tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.1        tidyverse_2.0.0      Seurat_5.1.0         SeuratObject_5.0.2   sp_2.1-4            

loaded via a namespace (and not attached):
  [1] jsonlite_1.8.8         magrittr_2.0.3         spatstat.utils_3.0-4   farver_2.1.2           rmarkdown_2.27         fs_1.6.4               vctrs_0.6.5            ROCR_1.0-11            memoise_2.0.1          spatstat.explore_3.2-7 htmltools_0.5.8.1      gridGraphics_0.5-1     sass_0.4.9             sctransform_0.4.1      parallelly_1.37.1      bslib_0.7.0            KernSmooth_2.23-24     htmlwidgets_1.6.4      ica_1.0-3              plyr_1.8.9             plotly_4.10.4          zoo_1.8-12             cachem_1.1.0           igraph_2.0.3           mime_0.12              lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.7-0           R6_2.5.1               fastmap_1.2.0          fitdistrplus_1.1-11    future_1.33.2          shiny_1.8.1.1          digest_0.6.35          colorspace_2.1-0       patchwork_1.2.0        tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1          pkgload_1.3.4          crosstalk_1.2.1        labeling_0.4.3         progressr_0.14.0       timechange_0.3.0       fansi_1.0.6            spatstat.sparse_3.0-3  httr_1.4.7             polyclip_1.10-6        abind_1.4-5            compiler_4.4.0         remotes_2.5.0         
 [52] withr_3.0.0            fastDummies_1.7.3      pkgbuild_1.4.4         MASS_7.3-60.2          sessioninfo_1.2.2      tools_4.4.0            lmtest_0.9-40          httpuv_1.6.15          future.apply_1.11.2    goftest_1.2-3          glue_1.7.0             nlme_3.1-165           promises_1.3.0         grid_4.4.0             Rtsne_0.17             cluster_2.1.6          reshape2_1.4.4         generics_0.1.3         gtable_0.3.5           spatstat.data_3.0-4    tzdb_0.4.0             hms_1.1.3              data.table_1.15.4      utf8_1.2.4             spatstat.geom_3.2-9    RcppAnnoy_0.0.22       ggrepel_0.9.5          RANN_2.6.1             pillar_1.9.0           limma_3.60.2           spam_2.10-0            RcppHNSW_0.6.0         later_1.3.2            splines_4.4.0          lattice_0.22-6         survival_3.7-0         deldir_2.0-4           tidyselect_1.2.1       miniUI_0.1.1.1         pbapply_1.7-2          knitr_1.47             gridExtra_2.3          scattermore_1.2        xfun_0.44              statmod_1.5.0          matrixStats_1.3.0      stringi_1.8.4          lazyeval_0.2.2         yaml_2.3.8             evaluate_0.24.0        codetools_0.2-20      
[103] cli_3.6.2              uwot_0.2.2             xtable_1.8-4           reticulate_1.37.0      jquerylib_0.1.4        munsell_0.5.1          Rcpp_1.0.12            globals_0.16.3         spatstat.random_3.2-3  png_0.1-8              parallel_4.4.0         ellipsis_0.3.2         presto_1.0.0           dotCall64_1.1-1        profvis_0.3.8          urlchecker_1.0.1       listenv_0.9.1          viridisLite_0.4.2      ggridges_0.5.6         leiden_0.4.3.1         rlang_1.1.4            cowplot_1.1.3